build scrapper passed #73

heytulsiprasad · 2019-12-21T16:00:40Z

Description

This PR is a WIP (work in progress) aimed at solving #68. The scraper.js file in the routes directory contains the code to fetch data from blog.zairza.in. It fetches the title, href, author, release date, and cover image link of every blog post and stores them as objects.

Dependencies Added

request
cheerio

Work remaining:

Build: scraper that fetches content from blog.zairza.in and stores it as objects
Fix: the above-mentioned issue
Make this code accessible through the app.js file
Create a template in blog.ejs for rendering the blog sections
Run a forEach loop over blogs.json at the end to bring the scraped data into our website

All suggestions are appreciated. 👍

heytulsiprasad · 2019-12-21T16:06:07Z

Feature Display

Concerns

The background image is zoomed out
The blog title is shifted slightly to the right (not aligned with the author name)
The date format is yyyy/mm/dd, whereas it was originally dd/mm/yyyy

Help and suggestions on the concerns above are appreciated. 👍
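
For the date concern, here is a minimal sketch of turning the ISO timestamp stored in the "release" field of data.json into DD/MM/YYYY. The function name formatRelease is hypothetical and not taken from the PR. It uses only the built-in Date object and reads the UTC fields, so the result does not depend on the server's time zone.

```javascript
// Hypothetical helper (not from the PR): format an ISO 8601 timestamp,
// such as the "release" field in data.json, as DD/MM/YYYY.
function formatRelease(isoString) {
  const date = new Date(isoString);
  if (Number.isNaN(date.getTime())) {
    throw new Error("Invalid date: " + isoString);
  }
  // UTC getters keep the output independent of the local time zone.
  const day = String(date.getUTCDate()).padStart(2, "0");
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  const year = date.getUTCFullYear();
  return `${day}/${month}/${year}`;
}

console.log(formatRelease("2019-08-25T12:13:49.122Z")); // prints 25/08/2019
```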

ankitjena

@tulsi-prasad Good work. See whether what I mentioned above can be done.

ankitjena · 2019-12-23T15:46:38Z

routes/scraper.js

@@ -0,0 +1,105 @@
+// The scraper for the blog.ejs section in the application. 


Where do you run this file?

ankitjena · 2019-12-23T15:48:20Z

routes/data.json

+        "href": "https://blog.zairza.in/oauth-using-mevn-stack-4b4a383dae08?source=collection_home---6------0-----------------------",
+        "author": "Ramakrishna Pattnaik",
+        "release": "2019-08-25T12:13:49.122Z",
+        "cover": "https://cdn-images-1.medium.com/fit/t/1600/480/1*zqCh8ZNR-LjBzaacpiIyUA.png"


Since you are fetching the cover image, it's huge, which is why it looks zoomed out. We need the first image inside the blog post instead. WDYT? Can this be done?

Yes, I think so. We need to scrape each blog post's href to get the right image. Working on it now.

We'll most likely do this only for the 4 most recent posts, as an optimization.

yeah, that should reduce unnecessary requests

routes/index.js

heytulsiprasad · 2019-12-25T05:10:33Z

Work to Do

Fetch the cover image URLs (the first image) from each individual blog post. This is taken care of in 567dc39, and the next commit adds them to the cover.json file for convenience.

Problems now facing

The scraped image URLs do not come out in the same order as the blog posts. This seems to appear out of nowhere, since the while loop iterates from count = 0 and reads from the data array, which is itself ordered. Only after fixing this can we render them in the ejs template.

Fix to Try

I am thinking of making a separate coverScraper.js file that imports data.json as it is, loops through its top 4 hrefs, and fetches the first image from each. I'll post an update on this by tonight.

EDIT 1

Refactored the code in the next commit, 2b837e5. ./json/cover.json stores the cover image URLs, which are still not in order. After every server run, the order of the JSON file changes. The reason is still unclear.

Any suggestions are appreciated. 👍
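
One plausible cause of the changing order, which the thread does not confirm, is that the requests run concurrently and each result is appended when its response arrives, so the file reflects completion order rather than request order. The sketch below simulates that with timers instead of real requests. fakeFetchCover, the example.com URLs, and the delays are all made up for illustration.

```javascript
// Stand-in for an HTTP request: resolves after a per-URL delay, so
// responses can arrive in a different order than they were requested.
function fakeFetchCover(href, delayMs) {
  return new Promise((resolve) => {
    setTimeout(() => resolve({ href, cover: href + "/first-image.png" }), delayMs);
  });
}

// Hypothetical links; the first one is deliberately the slowest.
const blogLinks = [
  { href: "https://example.com/post-1", delayMs: 60 },
  { href: "https://example.com/post-2", delayMs: 20 },
  { href: "https://example.com/post-3", delayMs: 40 },
];

// Pushing inside each callback records results as responses arrive.
async function collectByCompletion(links) {
  const covers = [];
  await Promise.all(
    links.map(async (link) => {
      const result = await fakeFetchCover(link.href, link.delayMs);
      covers.push(result.href);
    })
  );
  return covers;
}

// Promise.all resolves to an array in the order of its input,
// regardless of which request finished first.
async function collectInOrder(links) {
  const results = await Promise.all(
    links.map((link) => fakeFetchCover(link.href, link.delayMs))
  );
  return results.map((result) => result.href);
}

collectByCompletion(blogLinks).then((covers) => console.log("by completion:", covers));
collectInOrder(blogLinks).then((covers) => console.log("in input order:", covers));
```

If the real scraper appends to an array inside request callbacks, collecting the results through Promise.all (or awaiting each request in turn) would keep cover.json aligned with data.json.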

ankitjena · 2019-12-26T03:26:32Z

I am thinking of making a separate coverScraper.js file that imports data.json as it is, loops through its top 4 hrefs, and fetches the first image from each

You should keep the entire logic in one file. When you fetch the blog URL for the cover image, make another request and parse the response with cheerio. async/await will help.
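
A minimal sketch of that suggestion, assuming the scraper already holds the ordered list of posts in memory. The names attachCovers, fetchHtml, and firstImageSrc are hypothetical. The regex is only a dependency-free stand-in for selecting the first img element with cheerio, and fetchHtml is passed in so the example runs without network access.

```javascript
// Stand-in for parsing with cheerio: pull the src of the first <img> tag.
function firstImageSrc(html) {
  const match = /<img\b[^>]*?\bsrc=["']([^"']+)["']/i.exec(html);
  return match ? match[1] : null;
}

// For the first `limit` posts, fetch each post page in turn and store the
// cover on the post object itself, so covers cannot drift out of order.
async function attachCovers(posts, fetchHtml, limit = 4) {
  const result = [];
  for (const post of posts.slice(0, limit)) {
    const html = await fetchHtml(post.href); // one request at a time
    result.push({ ...post, cover: firstImageSrc(html) });
  }
  return result;
}

// Fake pages keyed by URL, used here instead of real HTTP requests.
const fakePages = {
  "https://example.com/a": '<p>intro</p><img src="https://example.com/a.png">',
  "https://example.com/b": '<img alt="x" src="https://example.com/b.png"><img src="other.png">',
};
const fakeFetchHtml = async (url) => fakePages[url] || "";

const posts = [
  { title: "A", href: "https://example.com/a" },
  { title: "B", href: "https://example.com/b" },
];

attachCovers(posts, fakeFetchHtml).then((withCovers) => console.log(withCovers));
```

Awaiting inside a for...of loop trades some speed for simplicity. With only four posts the extra latency is small, and the output order always matches the input order.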

heytulsiprasad · 2019-12-30T13:36:49Z

This PR is taken further in #74 to avoid any local conflicts.

heytulsiprasad · 2020-02-07T13:49:42Z

Have worked on a new branch. Will update in another PR.

heytulsiprasad added 6 commits December 8, 2019 13:27

add ext links to achievements

add8589

project card rotate animation

44e20c5

⚡ chore: build scraper for blog.ejs

4bc828b

scraper.js scraps from blog.zairza.in info regarding all blog posts

✨ feat: build scrapper success

8c4a020

build a scrapper with details to fetch into blogs.ejs

🎨 chore: add ejs template to blogs.ejs

4ef9569

📱 feat: make responsive layout+ add regex

076f255

layout of blogs section is made responsive and date mentioned in blog cards is rendered using regex syntax

heytulsiprasad added 2 commits December 22, 2019 09:35

👌 style: date format change to DD/MM/YYYY

724e04c

date used in blog cards are now of the format DD/MM/YYYY

💥 feat: fetch latest blog details

b1b1841

the latest post from medium is now also fetched and added to data.json file

ankitjena requested changes Dec 23, 2019

heytulsiprasad added 4 commits December 24, 2019 02:19

➕ add moment as dependency

f821ad6

add moment js as a dependency to work with date time objects

👌 update index.js with moment pkg

04a095a

write datetime objects to be rendered using moment package

⚡ build: scrapers for individual blog posts

567dc39

Scrapes the first image from each blog posts and forms a cover object.

🚧 chore: add cover objects in json

33f8547

added cover objects with img urls in cover.json file of first 4 blogs

heytulsiprasad added 2 commits December 25, 2019 15:45

♻️ refactor: scraper codes and json

2b837e5

json folder stores the scraped data and scrapcover is fetches cover images from each blogs

⚗️ make an array of four urls to scrap

ba01f29

bloglinks array contains four urls in order to be scraped for cover image

heytulsiprasad closed this Feb 7, 2020
